FiM++ Wiki:Proposals/Compiler
= Kyli Rouge =
== .FR file ==
A .FR file (abbreviation for "'''F'''riendship '''R'''eport") is proprietary FiM++ bytecode and must be read and executed by a virtual machine.
* Whitespace and comments are entirely removed (except for the programmer name).
* Keywords are represented by Unicode characters starting at , which represent their function, not the keyword actually used (any two synonyms are compiled to the same character).
** Binary prefix operators which have a partner infix operator (think add 12 and 2) are converted to a single infix operator (think 12+2).
* Variables, class names, and method names are compiled into hex digits
** surrounded by the Unicode character
* Literals are kept as-is, with any source quotes removed.
** Booleans are preceded by
** Numbers are preceded by
** Characters are preceded by
** Strings are surrounded by
** Replace any instance of in character literals or in String literals with . This is a weak point, as it leaves two characters represented the same way.
* Punctuation is compiled into Unicode representations of
=== Phrases ===
* ( ): CLASS
* ( ): END_CLASS
* ( ): IMPORT
* ( ): IMPLEMENTS
* ( ): METHOD
* ( ): MANE_METHOD
* ( ): END_METHOD
* ( ): RETURN_TYPE
* ( ): PARAMETERS
* ( ): RETURN
* ( ): REACALL
* ( ): VARIABLE
* ( ): BOOL
* ( ): BOOL_ARRAY
* ( ): CHARACTER
* ( ): CHARACTER_ARRAY
* ( ): CHARACTER_ARRAY_ARRAY
* ( ): NUMBER
* ( ): NUMBER_ARRAY
* ( ): ASSIGN
* ( ): ASSIGN_CONSTANT
* ( ): REASSIGN
* ( ): IF
* ( ): IF_PARTNER_SUF
* ( ): END_IF
* ( ): ELSE
* ( ): END_ELSE
* ( ): SWITCH
* ( ): CASE
* ( ): CASE_PARTNER_POST
* ( ): DEFAULT
* ( ): WHILE
* ( ): END_WHILE
* (!): DO_WHILE
* ("): END_DO_WHILE
* (#): PRINT
* ($): PROMPT
* (%): READ
* (&): ADD_IN
* ('): ADD_PRE
* ((): ADD_PRE_PARTNER_IN
* ()): DIVIDE_IN
* (*): DIVIDE_PRE
* (+): DIVIDE_PRE_PARTNER_IN
* (,): MULTIPLY_IN
* (-): MULTIPLY_PRE
* (.): MULTIPLY_PRE_PARTNER_IN
* (/): SUBTRACT_IN
* (0): SUBTRACT_PRE
* (1): SUBTRACT_PRE_PARTNER_IN
* (2): DECREMENT
* (3): INCREMENT
* (4): AND
* (5): OR
* (6): XOR
* (7): XOR_PARTNER_IN
* (8): NOT
* (9): EQUAL
* (:): NOT_EQUAL
* (;): GREATER_THAN
* (<): GREATER_THAN_OR_EQUAL
* (=): LESS_THAN
* (>): LESS_THAN_OR_EQUAL
* (?): NOTHING
* (@): TRUE
* (A): FALSE
=== Example ===
Hello World.FPP (190B)

Would compile into: Hello World.FR (93B, 51% compression). The compiled file is not shown inline, as Wikia won't allow its special characters on the site.
=== Interpretation steps ===
Hello World.FPP (190B)
# Read in the original code.
# Remove comments (except the special programmer name comment) and unnecessary whitespace.
# Replace phrases and punctuation with generics.
# Surround literals with special Unicode characters, removing quotes.
# Replace class, method, and variable names with numbers.
# Remove remaining whitespace.
# Replace generic phrases and punctuation with special Unicode characters.
Hello World.FR (93B)
= UNiTY (Mattia Borgo) =
I was thinking of an easier-to-execute bytecode, much like a VM's.
== Compiling ==
=== Format: .fb and .fba files ===
.fb files ('''F'''iM++ '''b'''ytecode) contain a language in its own right, placed beneath FiM++ and run in a VM.

.fba files ('''F'''iM++ '''b'''ytecode '''a'''rchive) are libraries of classes.
== Format Specification ==
=== .fb ===
The first character of a .fb file is always û (0xfb). It is followed by the author's name in a Pascal-like manner (length of the string as a byte, then the string), and then by the compiled bytecode.
==== Instructions ====
* Class declaration has a different type of instruction: in the list above, its argument is marked with a * (single asterisk). To make things clear, I'll start with an example. directly translates into , which needs a word to define its superclass, then a byte specifying the number of interfaces, followed by their respective words. We'll assume and have interface numbers 1 and 2, respectively; will be class slot 1. Hand-coding the example translates into the following bytes: 0x01 0x00 0x01 0x02 0x00 0x01 0x00 0x02
** Many instructions have a ** (double asterisk) next to their arguments. It essentially means that the argument is a runtime-calculated value.
It follows the syntax described below (Values and Operators).

Example:

This translates to:
*** Method declaration has a different type of instruction: in the list above, its argument is marked with a *** (triple asterisk). To make things clear, I'll start with an example. directly translates into , which needs a byte defining the return value for the method and a byte specifying the number of input arguments plus 1. All arguments' types are specified. Hand-coding the example translates into the following bytes: 0x10 0x4f 0x00 0x00 0x5f
=== Values and Operators ===
* Numbers: 64-bit integers, little endian.
* Booleans: either 0x00 (FALSE) or 0xff (TRUE).
* Characters: 1 Unicode character.
* Variables: 1 double word pointer.
* Methods: 1 word pointer to the class, 1 word pointer to the method, then the arguments.
* Arrays: stored in a Pascal-like manner (length first, then the elements).
== Examples ==
Hello World program:

Compiled language:

Compiled bytecode:

0xfb 0x0a 0x4b 0x79 0x6c 0x69 0x20 0x52 0x6f 0x75 0x67 0x65 0x02 0x11 0x80 0x60 0x00 0x0d 0x48 0x65 0x6c 0x6c 0x6f 0x2c 0x20 0x57 0x6f 0x72 0x6c 0x64 0x21 0x14 0x04

or, in one dump:

fb0a4b796c6920526f75676502118060000d48656c6c6f2c20576f726c64211404

The compiled bytecode's length is 33B, with a compression ratio of 81%.
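
The .fb header layout from the Format Specification (the 0xfb byte, then a length-prefixed author name, then the bytecode) can be checked against the Hello World dump. The Python sketch below is not part of the proposal: the function name <code>parse_fb_header</code> is hypothetical, and decoding the author's name as Latin-1 is an assumption, since the specification only says the name is a length byte followed by the string. It deliberately leaves the instruction bytes undecoded.

```python
FB_MAGIC = 0xFB  # first byte of every .fb file, per the Format Specification


def parse_fb_header(data: bytes):
    """Split a .fb file into (author name, remaining bytecode).

    Hypothetical helper: the name encoding (Latin-1) is an assumption,
    the proposal only specifies "length byte, then string".
    """
    if len(data) < 2 or data[0] != FB_MAGIC:
        raise ValueError("not a .fb file: missing 0xfb first byte")
    name_len = data[1]          # Pascal-style length prefix (one byte)
    end = 2 + name_len
    if len(data) < end:
        raise ValueError("truncated author name")
    author = data[2:end].decode("latin-1")
    return author, data[end:]   # everything after the name is bytecode


# The one-line dump from the Hello World example.
dump = "fb0a4b796c6920526f75676502118060000d48656c6c6f2c20576f726c64211404"
raw = bytes.fromhex(dump)

author, bytecode = parse_fb_header(raw)
print(len(raw))        # 33 (matches the stated 33B)
print(author)          # Kyli Rouge
print(len(bytecode))   # 21
print(bytecode.hex())
```

Of the 33 bytes, 12 form the header (1 for 0xfb, 1 for the name length, 10 for the name), which leaves 21 bytes of instructions.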